Job Title: Sr. Data Engineer
Job Duration: 6-month contract
Job Location: San Francisco, CA (Hybrid: in office 2 days a week, Tuesday through Thursday)
Our client is looking for an experienced Data Engineer to support the Marketing Solutions team in executing complex data development and improving database efficiency, with a deep focus on the quality, accuracy, and timeliness of data for marketing-related products. This role will interact directly with Engineering teams to optimize data operations for product metrics and surface insights at terabyte scale, globally. As our data and business needs grow, we continue to iterate on and evolve our processes, best-practice documentation, and deep-dive retrospectives to inform our go/no-go decision making. Ultimately, to be successful, the individual should be comfortable advising a team of data professionals, navigating our tech stack, and recommending solutions to complex data problems. Our team is highly collaborative, open, and honest, and brings a low-ego, high-impact mindset to work!
Responsibilities:
Design, develop, and manage data pipelines and workflows that enable efficient, accurate data processing using Trino SQL/Spark SQL on datasets warehoused in HDFS.
Produce code designs, and review and approve test cases.
Implement data quality checks and audits to maintain high data accuracy and integrity.
Produce elegant, efficient designs and high-performance, scalable code that can be easily extended to meet future needs.
Collaborate with cross-functional teams, especially data engineering, to understand data requirements and implement robust data solutions.
Work closely with data domain experts to gather data requirements, translate business needs into technical specifications, and communicate data insights effectively to improve sales representatives' workflow efficiency.
Optimize data storage for performance and scalability, ensuring efficient data extraction, transformation, and loading (ETL).
Develop and maintain documentation related to data pipelines, QA, metrics, and data policy as they relate to best practices, compliance, and GDPR.
Stay up to date with industry best practices and emerging trends in data engineering and analytics, including generative AI and its impact on our data operations.
Qualifications:
2+ years of experience using SQL and optimizing SQL databases for performance (Trino SQL or Spark).
Demonstrated experience managing data pipelines (e.g., on HDFS), code repositories (e.g., GitHub), workflows (e.g., Apache Airflow), and ETL following coding best practices.
Ability to communicate complex technical concepts to both technical and non-technical individuals.
Experience working with multiple stakeholders, setting project priorities and delivering on Objectives and Key Results (OKRs).
Experience automating script changes in Python.
Preferred Qualifications:
BA/BS in engineering, computer science, or a related technical field (such as statistics or data science).
Excellent analytical skills, including designing data workflows, analyzing data for anomalies, and setting data quality thresholds through automated solutions.
Familiarity with data governance principles.
Program Manager experience.
Demonstrated experience managing data pipelines in HDFS.
Experience running a Scrum team and using Jira.
Experience with Spark SQL.
Suggested Skills:
Data Analysis
Project Management
Data Engineering